Using phone log-likelihood ratios as features for speaker recognition
Authors
Abstract
The so-called Phone Log-Likelihood Ratio (PLLR) features, computed from the phone posterior probabilities provided by phonetic decoders, convey acoustic-phonetic information as a sequence of frame-level vectors. PLLRs can therefore be plugged into traditional acoustic systems simply by replacing MFCCs, PLPs or any other frame-level representation. PLLR features were used under an iVector-PLDA approach in our submission to the NIST 2012 Speaker Recognition Evaluation (SRE). In this work, we report on how well these features perform for speaker recognition. Results on the clean telephone speech condition of the NIST 2010 and 2012 SRE show that, although the system based on PLLR features does not reach state-of-the-art performance, fusing it with a traditional acoustic system (trained on MFCC features) provides remarkable gains in performance (among the best reported for the NIST 2012 SRE telephone condition without added noise), revealing a fruitful way of using acoustic-phonetic information for speaker recognition.
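As a rough illustration of how frame-level phone posteriors can be turned into PLLR-style feature vectors, the sketch below applies a log-odds (logit) transform to each posterior. The abstract does not state the exact formula, so the logit form, the function name `pllr_features`, the `eps` clipping constant and the toy posterior matrix are all assumptions made for this example; published formulations may differ, for instance by a constant offset.

```python
import numpy as np

def pllr_features(posteriors, eps=1e-10):
    """Map frame-level phone posteriors (frames x phones) to log-likelihood ratios.

    Assumption: each feature is the log-odds of one phone against all others.
    """
    p = np.asarray(posteriors, dtype=float)
    # Clip away exact 0 and 1 so both logarithms stay finite.
    p = np.clip(p, eps, 1.0 - eps)
    # log(p / (1 - p)), written with log1p for numerical stability.
    return np.log(p) - np.log1p(-p)

# Hypothetical decoder output: 3 frames, 4 phone classes, rows sum to 1.
posteriors = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.80, 0.10],
])
features = pllr_features(posteriors)
print(features.shape)
```

The result has one vector per frame, with the same layout as an MFCC or PLP matrix, which is what would let it stand in for those features at the input of an iVector-style system.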
Similar resources
New insight into the use of phone log-likelihood ratios as features for language recognition
Phone Log-Likelihood Ratio (PLLR) features have recently been introduced as an effective way of using frame-level phone posteriors in language and speaker recognition systems. This paper examines PLLR features in depth and provides further evidence of their usefulness in spoken language recognition tasks, with a new set of experiments carried out on the ...
Utterance Verification Using State-Level Log-Likelihood Ratio with Frame and State Selection
This paper proposes an utterance verification system that uses a state-level log-likelihood ratio with frame and state selection. Hidden Markov models serve as the acoustic models and anti-phone models for speech recognition and utterance verification. Each hidden Markov model has three states, and each state represents different characteristics of a phone. Thus we propose an algorithm to compute state-lev...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy keeps improving, there is still a considerable gap between their performance and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Language Recognition on Albayzin 2010 LRE using PLLR features
Phone Log-Likelihood Ratios (PLLR) have recently been proposed as alternative features to MFCC-SDC for iVector Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided, with a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band multi-speaker TV broadcast speech on si...
Language Recognition on Albayzin 2010 LRE using PLLR features / Reconocimiento de la Lengua en Albayzin 2010 LRE utilizando características PLLR
Phone Log-Likelihood Ratios (PLLR) have recently been proposed as alternative features to MFCC-SDC for iVector Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided, with a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band multi-speaker TV broadcast speech on si...